Identification of a divergent O-acetyltransferase gene oac1b from Shigella flexneri serotype 1b strains

Qiangzheng Sun 1,*, Ruiting Lan 2,*, Yan Wang 1,*, Jianping Wang 1,*, Shengli Xia 3,*, Yiting Wang 1, Jin Zhang 3, Deshan Yu 4, Zhenjun Li 1, Huaiqi Jing 1 and Jianguo Xu 1

Shigella flexneri is a leading cause of bacterial dysentery in developing countries. Among the 15 known serotypes, four (1b, 3a, 3b and 4b) contain a group 6 epitope due to an acetyl group connected to the O-2 position of rhamnose III on the tetrasaccharide structure of the lipopolysaccharide. O-acetyltransferase encoded by a bacteriophage, Sf6, mediates the acetylation reaction. We found that the oac gene in serotype 1b strains was very different from that in serotype 3a, 3b and 4b strains. This gene, hereinafter referred to as oac1b, shares 88%-89% identity with oac at the DNA level and 85% identity at the protein level. Considering that S. flexneri strains of serotypes 1-5 share a recent common ancestry, the divergent oac1b is more likely to have been obtained from outside S. flexneri than to have undergone rapid divergence from the oac gene in the other serotypes (3a, 3b and 4b) within S. flexneri. The cloned oac1b gene was found to perform the same acetylation function as oac. Analysis of the genomic regions flanking oac1b showed that it was present in a prophage on the chromosome and that its organizational structure is different from that of phage Sf6. Additionally, a phage conversion assay showed that serotype 1b cannot be generated by infecting serotype 1a strains with Sf6. We conclude that oac1b was carried by a non-Sf6 phage and is uniquely present in serotype 1b.
INTRODUCTION

Shigella flexneri is the major pathogen responsible for bacterial dysentery in developing countries. 1 It is estimated that there are 164.7 million cases of shigellosis worldwide annually, resulting in 1.1 million deaths, most of them in children under the age of 5 years. 1 A more recent study estimated approximately 125 million annual shigellosis cases and 14 000 related deaths in Asia. 2 In China, S. flexneri is the most common Shigella species, accounting for up to 80% of shigellosis cases. 3-5

Based on the O-antigen structure of the lipopolysaccharide (LPS), S. flexneri is currently divided into 15 serotypes. 5,6 With the exception of serotype 6, all share a common polysaccharide backbone composed of repeating tetrasaccharide units (N-acetylglucosamine-rhamnose-rhamnose-rhamnose). Serologically, serotype 1b, 3a, 3b and 4b strains may cross-react with group 6 antisera. Structural analysis has also revealed that the LPS tetrasaccharide backbone in serotypes 1b, 3a, 3b and 4b contains an acetyl group connected to the O-2 position of rhamnose III, and this acetylation results in the appearance of the group 6 epitope.



Currently, it is generally believed that O-antigen acetylation is mediated by an O-acetyltransferase (Oac) encoded by the oac gene carried by the temperate bacteriophage Sf6. 7,8 In 1975, Gemski et al. 9 first reported that phage Sf6 could be isolated from an S. flexneri serotype 3a strain. Clark et al. 7 and Verma et al. 8 independently identified the oac gene from phage Sf6, which is 1002 bp in size and encodes a protein of 333 amino acids. Sequence comparison showed that Oac shares homology with a variety of proteins involved in O-acetylation. 10,11 Oac is an integral membrane protein with 10 transmembrane segments, and Oac function is associated with residues within cytoplasmic and periplasmic loops. 12 Residues R73 and R75R76 within cytoplasmic loop 3 are critical to Oac function. 12 The oac gene cloned from phage Sf6 was shown to be capable of converting serotypes X, Y, 1a and 4a to 3a, 3b, 1b and 4b, respectively. 7

In this study, we amplified and sequenced the oac gene from 36 serotype 1b, 3a, 3b and 4b strains, and found that the serotype 1b strains possess a divergent oac gene, herein named oac1b. We further characterized the gene and the genetic organization of the regions flanking oac1b to show that oac1b was not introduced by phage Sf6, but by a potential novel phage.
Table 1 Properties of S. flexneri serotype 3a, 3b, 4b and 1b strains analyzed in this study

| Serotype | Strain | Region of isolation | Host | Year of isolation | DNA sequence identity to oac of Sf6 (%) | oac genotype |
|---|---|---|---|---|---|---|
| 3a | 2002061 | China/Henan | Human | 2002 | 99 | oac |
| 3a | 2002074 | China/Henan | Human | 2002 | 99 | oac |
| 3a | 2002127 | China/Henan | Human | 2002 | 99 | oac |
| 3a | 2002133 | China/Henan | Human | 2002 | 99 | oac |
| 3a | 03HL12 | China/Heilongjiang | Human | 2003 | 99 | oac |
| 3a | 06GS02 | China/Gansu | Human | 2006 | 99 | oac |
| 3a | 06GS03 | China/Gansu | Human | 2006 | 99 | oac |
| 3a | HB05 | China/Hubei | Human | 2008 | 100 | oac |
| 3a | 51575 | China/Gansu | Human | - | 100 | oac |
| 3b | 2002110 | China/Henan | Human | 2002 | 99 | oac |
| 3b | 20030363b | China/Henan | Human | 2003 | 99 | oac |
| 4b | NCTC8336 | England/London | Monkey | 1947 | 100 | oac |
| 4b | NCTC8522 | England/Birmingham | Human | 1951 | 100 | oac |
| 4b | NCTC8598 | England/London | Monkey | 1953 | 100 | oac |
| 4b | NCTC9726 | USA/Atlanta | - | 1955 | 99 | oac |
| 4b | 51577 | China/Sichuan | Human | - | 100 | oac |
| 1b | [illegible] | England/London | Human | 1920 | 89 | oac1b |
| 1b | 1997008 | China/Henan | Human | 1997 | 89 | oac1b |
| 1b | 1997019 | China/Henan | Human | 1997 | 89 | oac1b |
| 1b | 1997020 | China/Henan | Human | 1997 | 89 | oac1b |
| 1b | 1997021 | China/Henan | Human | 1997 | 89 | oac1b |
| 1b | 1997022 | China/Henan | Human | 1997 | 89 | oac1b |
| 1b | 2008070 | China/Henan | Human | 2008 | 89 | oac1b |
| 1b | 2008071 | China/Henan | Human | 2008 | 89 | oac1b |
| 1b | 2008072 | China/Henan | Human | 2008 | 89 | oac1b |
| 1b | 2008073 | China/Henan | Human | 2008 | 89 | oac1b |
| 1b | 2005020 | China/Henan | Human | 2005 | 89 | oac1b |
| 1b | 06HN87 | China/Henan | Human | 2006 | 89 | oac1b |
| 1b | 07GS73 | China/Gansu | Human | 2007 | 88 | oac1b |
| 1b | 07HN57 | China/Henan | Human | 2007 | 89 | oac1b |
| 1b | 09GS62 | China/Gansu | Human | 2009 | 89 | oac1b |
| 1b | 09GS70 | China/Gansu | Human | 2009 | 88 | oac1b |
| 1b | 09GS119 | China/Gansu | Human | 2009 | 88 | oac1b |
| 1b | 51572 | China/Shandong | Human | - | 89 | oac1b |
| 1b | M1250 | Australia | Human | - | 89 | oac1b |
| 1b | M1349 | Unknown (a) | Human | - | 89 | oac1b |

(a) The source of M1349 (ref. 18) is not known, but it was not isolated from China.



MATERIALS AND METHODS 

Bacterial strains and culture conditions 

The S. flexneri strains used for oac gene analysis in this study are listed in Table 1. All strains were identified biochemically using the Dade Behring MicroScan WalkAway 40 (Dade Behring, Hessen, Germany). Serotypes were confirmed using two commercial serotyping kits: the antisera made by Denka Seiken (Tokyo, Japan) and the monoclonal antibodies against S. flexneri (Reagensia AB, Stockholm, Sweden). Ampicillin-sensitive S. flexneri strains 03XZ014 (serotype Y), NCTC9725 (serotype 4a), 05004 (serotype 1a) and 04SH03 (serotype X) were used as hosts for oac gene functional analysis. S. flexneri strains 03HL12 (serotype 3a) and 019 (serotype 1a) 13 were used as hosts for induction of phages Sf6 and SfI, respectively. Four serotype X strains (014, 51580, 04SH03 and 062), 11 serotype Y strains (036, 035, 51581, 017, 03XZ014, 038, 043, 064, 065, 025 and 026), 12 serotype 1a strains (51571, 019, GS30, HB31, 080, SX25, QH20, SX12, 05004, HN184, QH37 and AH93), four serotype 4a strains (NCTC9725, NCTC8296, NCTC7885 and 004) and two serotype 3b strains (110 and 061) were used for phage infection experiments. All the Chinese strains were isolated from diarrheal patients. Other strains were obtained from the National Collection of Type Cultures (NCTC). Strains were generally grown at 37 °C in Luria broth (LB) with agitation, or on LB agar.

Oligonucleotide primers, PCR and DNA sequencing 

Primers used in this study are listed in Table 2. All primers were synthesized by Sangon Biotech (Shanghai, China). Unless otherwise stated, PCR amplification was performed using a standard protocol with the following thermal cycling profile: 94 °C for 5 min followed by 30 cycles of 94 °C for 30 s, 55 °C for 50 s and 72 °C for 5 min, on a SensoQuest LabCycler (SensoQuest, Germany). Walking PCR was performed using the Genome Walking PCR Kit (TaKaRa, Kyoto, Japan) according to the manufacturer's protocol. PCR products were either sequenced directly or cloned into the pMD20T TA cloning vector (TaKaRa, Japan) for sequencing.
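As a rough planning aid, the standard cycling profile described above can be written down as data and its programmed hold time summed. This is only a sketch: the class and variable names are illustrative, and temperature ramp times, which depend on the instrument, are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # hold temperature in degrees Celsius
    seconds: int    # hold time in seconds


# Values taken from the protocol text; ramp times are not included (assumption).
INITIAL_DENATURATION = Step(94, 5 * 60)
CYCLE = (Step(94, 30), Step(55, 50), Step(72, 5 * 60))
N_CYCLES = 30


def total_hold_seconds(initial, cycle, n_cycles):
    """Sum of programmed hold times, excluding temperature ramps."""
    return initial.seconds + n_cycles * sum(step.seconds for step in cycle)


total = total_hold_seconds(INITIAL_DENATURATION, CYCLE, N_CYCLES)
print(f"{total} s = {total / 3600:.2f} h of programmed hold time")
```

The long 5 min extension step dominates the run time, which is consistent with the profile also being used for the multi-kilobase amplicons in Table 2.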

oac gene sequencing and functional analysis 

The full length of the oac gene was amplified using primer pair O1 (Table 2). DNA and deduced protein sequence comparison was performed using BLASTn or BLASTp at NCBI (http://www.ncbi.nlm.nih.gov/). Additionally, the fragment (nt 2540-3799 of accession no. JF450728) carrying the oac gene, together with the putative promoter and terminator regions (nt 2615-2643 and 3683-3698 of accession no. JF450728, respectively), was amplified using primer pair O2 (Table 2). Purified products were cloned into the pMD20T (AmpR) vector and transformed into S. flexneri strains 03XZ014 (serotype Y), NCTC9725 (serotype 4a), 05004 (serotype 1a) and 04SH03 (serotype X), respectively. Transformation was performed by electroporation using a Bio-Rad Gene Pulser (Bio-Rad, Hercules, CA, USA). Transformants were identified by PCR amplification of the oac gene and by serotyping using slide agglutination with both monovalent antisera (Denka Seiken, Japan) and monoclonal antibodies against S. flexneri (Reagensia AB, Sweden).

Table 2 Primers used in this study

| Primer pair | Primer | Primer sequence (5'-3') | Target gene or fragment | Amplicon size (bp) |
|---|---|---|---|---|
| O1 | O1U | TCA ATC CAG GGA TAA TTT AGG CG | oac or oac1b | 1002 |
| O1 | O1L | ATG CAT AAG AGC AAC TGC TTT GA | | |
| O2 | O2U | TAG AAA CAG AAG CCA CTG GAG CAC C | oac1b, promoter and terminator | 1260 |
| O2 | O2L | GCT GCG TGG AAA AGA ACT CCA CCT T | | |
| O3 | O3U | CCG CCA GGA TGG TGA AAA AGA G | yfdC-oac | 3034 |
| O3 | O3L | AGA ACG CCA GTC CAC GCA AAG G | | |
| O4 | O4U | ATG CAT AAG AGC AAC TGC TTT GA | oac-tsp | 2075 |
| O4 | O4L | GGT TTA TGG CTG GGT ATT TGA T | | |
| O5 | O5U | CCG CCA GGA TGG TGA AAA AGA GC | yfdC-tsp | 4489 |
| O5 | O5L | GGT TTA TGG CTG GGT ATT TG | | |
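The primer sequences for the full-length oac amplicon (Table 2) can be sanity-checked against the protein sequence shown in Figure 1B. The sketch below translates one primer directly and the reverse complement of the other, using the standard genetic code; the function and variable names are illustrative and this is not a tool used in the study.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the first base; trailing bases are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Primer pair for the 1002 bp oac / oac1b amplicon, spaces removed.
PRIMER_START = "ATGCATAAGAGCAACTGCTTTGA"   # begins with the ATG start codon
PRIMER_END = "TCAATCCAGGGATAATTTAGGCG"     # reverse complement ends with a stop codon

n_term = translate(PRIMER_START)
rc = revcomp(PRIMER_END)
# Choose the frame so that translation finishes on the last base of the primer.
c_term = translate(rc[len(rc) % 3:])

print(n_term)  # MHKSNCF
print(c_term)  # PKLSLD*
```

The two peptides match the first seven and last six residues of Oac and Oac1b in Figure 1B, which is what one would expect for primers that anneal at the conserved ends of the open reading frame.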

Characterization of the regions flanking the oac1b gene

Based on known arrangements of the Sf6 genome in its host strains, 14 primer pairs O3, O4 and O5 (Table 2), which are complementary to sequences of the yfdC, oac and tsp genes, were designed for PCR identification of the chromosomal regions flanking the oac gene in serotype 3a, 3b and 4b strains. To identify the regions flanking oac1b in serotype 1b, we first performed PCR walking starting from the oac1b gene, and then used Illumina Solexa sequencing on the whole genome of strain 1997020. Genomic DNA was extracted from broth culture using a Wizard Genomic DNA Purification kit (Promega, Madison, WI, USA) as described in the manufacturer's manual. A paired-end library was constructed with an average insert length of about 500 bp. Reads were generated with an Illumina Solexa GA IIx (Illumina, San Diego, CA, USA) and assembled into scaffolds using SOAPdenovo (Release 1.04). Fragments carrying the oac1b gene and flanking sequence were extracted. Open reading frames (ORFs) were determined using the ORF Finder program, which is accessible through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/gorf/gorf.html), and conformed to the codon usage table for Escherichia coli. Searches for homologous DNA and protein sequences were conducted with the BLAST software against the non-redundant GenBank database (http://www.ncbi.nlm.nih.gov/blast/blast/). Based on the DNA sequence of the oac1b-carrying fragment in 1997020 (accession no. JN377795), a series of primers was designed and used for overlapping PCR to confirm the genetic structures in the other 19 serotype 1b strains.

Phage conversion assay

Phages Sf6 and SfI used in this study were induced, isolated and purified from S. flexneri strains 03HL12 (serotype 3a) and 019 (serotype 1a), respectively, using the methods described previously. 13 Phage infection and identification of lysogens were performed essentially according to the methods for phage λ with the following modifications. 15 First, S. flexneri host strains were inoculated in LB and incubated for 3 h at 37 °C with shaking. Cells were collected by centrifugation when the OD600 reached 1.2 and resuspended in MgSO4 (10 mM). Then 100 µl of purified phage particles was added to 200 µl of competent cells at a ratio of 1 phage to 100 cells. After further incubation at 37 °C for 20 min, 3 ml of semisolid agar (LB with 0.7% (w/v) agar) was added, and the mixture was overlaid on Brain-Heart solid medium and incubated at 37 °C. The area of turbid growth was streaked for single colony isolation and serotype identification.
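The infection ratio of 1 phage to 100 cells fixes the phage titer needed once the cell density is known. The following sketch shows the arithmetic only; the cell density used in the example is a hypothetical placeholder, since the text does not report cells per millilitre at OD600 = 1.2, and the function name is illustrative.

```python
def phage_titer_needed(cells_per_ml, cell_volume_ml=0.2,
                       phage_volume_ml=0.1, cells_per_phage=100):
    """Phage titer (particles per ml) that gives 1 phage per `cells_per_phage` cells.

    Default volumes follow the protocol text: 200 ul of cells and 100 ul of phage.
    """
    total_cells = cells_per_ml * cell_volume_ml
    phage_particles = total_cells / cells_per_phage
    return phage_particles / phage_volume_ml


# Hypothetical density, for illustration only (not a value from the study).
ASSUMED_CELLS_PER_ML = 1e9
titer = phage_titer_needed(ASSUMED_CELLS_PER_ML)
print(f"required phage stock: about {titer:.1e} particles per ml")
```

With the assumed density, 200 µl holds about 2 x 10^8 cells, so roughly 2 x 10^6 phage particles are needed in the 100 µl inoculum.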

Nucleotide sequence accession number 

The nucleotide sequences obtained in this study have been deposited in GenBank (accession nos. JF450698-JF450729 and JN377795).

RESULTS 

A new oac1b gene, divergent from the oac genes in serotype 3a, 3b and 4b strains, was identified in S. flexneri serotype 1b strains

Serotypes 1b, 3a, 3b and 4b are known to contain an O-acetyl group and thus carry an oac gene. We sequenced the oac gene from 36 S. flexneri strains to determine its diversity. The oac genes in serotype 3a, 3b and 4b strains were highly homologous: six strains (HB05, 51575, NCTC8522, NCTC8598, NCTC8336 and 51577) had an identical sequence, whereas the remaining 10 strains differed from these six by a single base (334, A→G) (Table 1). The oac gene from phage Sf6, 9 which was derived from an S. flexneri serotype 3a strain, was identical to that of the former group of strains.

Sequences of the oac genes amplified from all 20 S. flexneri serotype 1b strains were also highly homologous to each other. All but three strains were identical, and the remaining three strains (07GS73, 09GS70 and 09GS119) differed by one base (783, C→T) (Table 1). Surprisingly, the oac gene in serotype 1b was markedly different from that in serotype 3a, 3b and 4b strains (Table 1). There are 110 base changes in the 1002 bp gene, of which 72 are synonymous and 38 are non-synonymous. The distribution of these changes along the gene is shown in Figure 1A. We calculated the ratio of non-synonymous to synonymous substitution rates (Ka/Ks) using a sliding window. The ratios in two regions are much higher than in the rest of the gene (Figure 1A), suggesting that these regions may have been subjected to diversifying selection. The three major regions (amino acid residues 40-57, 69-81 and 138-156) conserved among the inner membrane trans-acylase family proteins 10 show very low Ka/Ks ratios (Figure 1A and 1B). The residues R73 and R75R76, known to be critical for Oac function, 12 are also conserved (Figure 1A and 1B).
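The reported counts of base changes are consistent with the percentage identities listed in Table 1, as the short calculation below illustrates. It is plain arithmetic on the numbers quoted in the text, with illustrative names, and does not reproduce the BLAST computation itself.

```python
GENE_LENGTH_BP = 1002   # length of oac and oac1b, from the text
BASE_CHANGES = 110      # differences between oac1b and oac, from the text
SYNONYMOUS = 72
NON_SYNONYMOUS = 38


def percent_identity(length, mismatches):
    """Percentage of identical positions in an ungapped alignment."""
    return 100.0 * (length - mismatches) / length


identity = percent_identity(GENE_LENGTH_BP, BASE_CHANGES)
nonsyn_fraction = NON_SYNONYMOUS / BASE_CHANGES

print(f"DNA identity: {identity:.1f}%")
print(f"non-synonymous share of changes: {nonsyn_fraction:.1%}")
```

The result, about 89%, agrees with the identity reported for most serotype 1b strains, and roughly one third of the changes alter the encoded amino acid.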
Alignment text recovered from Figure 1B (X marks a residue that is illegible in the source; match lines and transmembrane segment labels I-X are not reproduced):

Oac1b   1 MHKSNCFDTARLVAAMMVLISHHYALSGQLEPYIFGFESXGGIAVIIFFSISGYLISKSAIRSDSFIDFMAKRARRIFPA 80
Oac     1 MHKSNCFDTARLVAAMMVLVSHHYALSGQPEPYLFGFESAGGIAVIIFFSISGYLISKSAIRSDSFIDFMAKRARRIFPA 80

Oac1b  81 HVPCSVLTYFLFGWMLNNFSTEYFSHDIIRKTISSILMSQAPDADITSHLIHPGINGSLWTLPLEFMCYIITGVAVALLK 160
Oac    81 XVPCSILTYFLFGWILNDFSAEYFSHDIVRKTISSIFMSQAPDADITSHLIHAGINGSLWTLPLEFLCYIITGVAVALLK 160

Oac1b 161 SGKTFIVILLIFVSLSLVESVSENKDVIFSIPVWLYPLRGMAFFFGATMAMYEKSWNVSNVKISVTSLLAMYAYASYGKG 240
Oac   161 NGKAFIVILLVFVSLSLIGSVSENRDVMFSIPLWLYPLRGLAFFFGATMAMYEKSWNVSNVKITXXSLLAMYAYASYGKG 240

Oac1b 241 IDYTMACYILVSFSTIAICTSFGDPLVKGRFDYSYGVYIYAFPVQQVIINTLHIGFYPSMLLSAIVVLFLAHLSWNLIEK 320
Oac   241 IDYTMTCYILVSFSTIAICTSVGDPLVKGRFDYSYGVYIYAFPVQQVVINTLHMGFYPSMLLSAVTVLFLSHLSWNLVEK 320

Oac1b 321 RFLTRSSPKLSLD 333
Oac   321 RFLTRSSPKLSLD 333

Figure 1 Comparison of proteins Oac1b and Oac. (A) Plot of variation between Oac1b and Oac. Nucleotide differences are plotted across the horizontal axis, with synonymous and non-synonymous mutations shown as black and red vertical lines, respectively. Ticks and numbers across the horizontal axis are base positions. Transmembrane (TM) segments as predicted by Verma et al. are shown in yellow boxes below the gene. The critical residues for Oac function are indicated by green stars. The ratio of non-synonymous to synonymous substitution rates (Ka/Ks) across the gene, calculated using a sliding window of 90 bases (30 codons) and an overlap of 3 bases (1 codon), is shown above the gene. (B) Pairwise alignment of the amino acid sequences of Oac1b and Oac. Lines represent amino acid residues that are identical, whereas dots represent amino acids that are similar. TM segments are indicated in red. The three major regions conserved among the inner membrane trans-acylase family proteins are boxed. The three critical residues for Oac function are marked by asterisks.
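The sliding-window scheme described in the Figure 1 caption can be sketched as follows. The sketch assumes that the "overlap of 3 bases" means the window advances by one codon at a time, and it counts raw mismatches per window rather than Ka/Ks, which would require a substitution model that is not specified in the text. The toy sequences are hypothetical and stand in for the aligned genes.

```python
def sliding_windows(length, window=90, step=3):
    """Yield (start, end) half-open intervals of `window` bases, advancing by `step`."""
    for start in range(0, length - window + 1, step):
        yield start, start + window


def mismatches_per_window(seq_a, seq_b, window=90, step=3):
    """Count differing positions between two aligned, equal-length sequences per window."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diff = [a != b for a, b in zip(seq_a, seq_b)]
    return [sum(diff[s:e]) for s, e in sliding_windows(len(seq_a), window, step)]


# Window layout for a 1002 bp gene.
windows = list(sliding_windows(1002))

# Toy sequences (hypothetical, not the real genes) with differences at 100, 101 and 500.
toy_a = "A" * 1002
toy_b_chars = list(toy_a)
for pos in (100, 101, 500):
    toy_b_chars[pos] = "G"
toy_b = "".join(toy_b_chars)
counts = mismatches_per_window(toy_a, toy_b)

print(len(windows), windows[0], windows[-1])
print(max(counts))
```

Under these assumptions a 1002 bp gene yields 305 windows, so each plotted ratio summarizes 30 codons and neighbouring points share 29 of them.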



The serotype 1b strains used in this study were isolated from different countries and regions at different times (Table 1), and belonged to at least two different sequence types based on multilocus sequence typing of 15 housekeeping genes (Sun et al., unpublished data), while the strains of the other serotypes were also from diverse sources. Thus, we conclude that the divergent oac gene is specific to serotype 1b, and hence we named it oac1b.

Oac1b has the same enzymatic function as Oac

To determine whether Oac1b performs the same O-acetylation function as Oac, oac1b from strain 1997020, together with its promoter and terminator elements, was amplified, cloned into the TA vector pMD20T and transformed into 03XZ014 (serotype Y), NCTC9725 (serotype 4a), 05004 (serotype 1a) and 04SH03 (serotype X), respectively. All the transformants were capable of agglutinating with S. flexneri group 6 antisera, and these serotype Y, X, 4a and 1a strains were converted into serotypes 3b, 3a, 4b and 1b, respectively. The agglutination intensity of the transformants showed no difference from that of serotype 3a, 3b, 4b and 1b clinical isolates. These results suggest that Oac1b possesses Oac activity, mediating the acetylation of the O-antigen in these serotypes.

The oac1b gene was located in a chromosomal region characteristic of phage origin but different from the Sf6 phage genome

To determine whether the oac1b gene is also carried by an Sf6 phage, we performed PCR on all 20 serotype 1b strains using primer pairs O3 and O4 (Table 2), which were designed from the Sf6 genome sequence. All PCR amplifications were negative, suggesting that the region is very different. We then used genome walking PCR starting from the oac1b gene and obtained two fragments of about 3.1 kb and 2.3 kb, upstream and downstream of the oac1b gene respectively, from serotype 1b strain 1997020. The 5451 bp sequence encodes six complete ORFs and two incomplete ORFs (Figure 2A). Upstream of oac1b are two housekeeping genes (torR and torT) and an IS element (IS1), whereas downstream of oac1b are three genes with high identity to phage or prophage genes, none of which is homologous to any gene in phage Sf6 (Figure 2A), suggesting the presence of a novel prophage. To obtain the entire sequence of the prophage, the whole genome of strain 1997020 was sequenced using Illumina Solexa sequencing technology. A total of 4 637 796 reads were generated, giving about 110-fold coverage, and these were assembled de novo into 280 contigs (>1000 bp). A contig of 30 kb carrying oac1b and its flanking regions was identified, and a total of 38 ORFs (including one pseudogene, ORF36) were predicted by ORF Finder (Supplementary Table S1 and Figure 2B) (accession no. JN377795).

Figure 2 Genomic structure of the oac1b region. (A) Comparison of chromosomal regions flanking the O-acetyltransferase gene oac1b in serotype 1b strains, and oac in serotype 3a, 3b and 4b strains and serotype-converting phage Sf6. Regions sharing >85% sequence identity are indicated by shaded boxes. Genes coding for the same function are shown in the same color. Key primers used in this study are marked by arrows. (B) Genomic structure of regions flanking the O-acetyltransferase gene oac1b in serotype 1b strain 1997020 and comparison with relevant regions of sequenced S. flexneri strains 301, 2457T and 2002017. The details of ORFs in strain 1997020 are listed in Supplementary Table S1. Genes sharing high homology are shown in the same color. (C) Comparison of the genomic structure of the oac1b-carrying prophage in serotype 1b strain 1997020 with prophage genomes in Salmonella enterica serovar Paratyphi B strain SPB7 and E. coli O127:H6 strain E2348/69. Genes sharing >40% identity at the amino acid level between the strains are marked by color: red, >80% identity; yellow, 60%-80%; blue, 40%-60%.
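The relationship between read count and fold coverage reported above follows the usual estimate, coverage = reads x read length / genome size. The text gives the number of reads and the coverage but not the read length or the genome size, so in the sketch below the genome size is an assumed round figure for an S. flexneri chromosome and the read length is back-calculated from it; both should be treated as illustrative.

```python
def fold_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


def implied_read_length(n_reads, coverage, genome_size_bp):
    """Mean read length that would give `coverage` for the stated read count."""
    return coverage * genome_size_bp / n_reads


N_READS = 4_637_796                  # from the text
REPORTED_COVERAGE = 110              # from the text
ASSUMED_GENOME_SIZE_BP = 4_600_000   # assumption: approximate genome size, not from the text

read_len = implied_read_length(N_READS, REPORTED_COVERAGE, ASSUMED_GENOME_SIZE_BP)
print(f"implied mean read length: about {read_len:.0f} bp")
print(f"check: {fold_coverage(N_READS, read_len, ASSUMED_GENOME_SIZE_BP):.0f}-fold")
```

Under the assumed genome size the reported figures imply reads of a little over 100 bp on average.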

DNA and protein level analyses of the 38 ORFs found that ORF1 and ORF2 correspond to genes torR and torT, while ORF36, ORF37 and ORF38 are genes ycmA, yccZ and ept, respectively, which are present in the genomes of sequenced S. flexneri strains 301, 2457T and 2002017. The sequence from ORF3 to ORF35 has no homologous sequence between torT and ycmA in the genomes of 301, 2457T and 2002017 (Figure 2B). The insertion of ORF3 to ORF35 has been accompanied by deletion of the region from torS to part of ORF36 (ycmA), which has its 5' 935-bp region truncated, thereby resulting in a pseudogene (Figure 2B).

BLASTp analysis found that most of the proteins encoded by the 33 ORFs (ORF3-ORF35) are similar to bacteriophage proteins, except for nine ORFs whose functions are unknown. The phage-related ORFs include ORF6 (tail fiber), ORF7 (tail protein), ORF9 (baseplate protein), ORF11 (baseplate assembly protein), ORF17 (tail tape measure protein), ORF18 (structural protein), ORF21 (head-tail adaptor), ORF23 (structural protein), ORF25 (capsid), ORF27-ORF28 (head), ORF29 (portal), ORF30 (terminase large subunit) and ORF31 (terminase small subunit) (Supplementary Table S1 and Figure 2B). Two putative insertion sequences, IS1 (ORF3 and ORF4) and IS911 (ORF32-ORF34), were located downstream of torT and ORF31, respectively (Supplementary Table S1 and Figure 2B). The sequences of IS1 and IS911 are identical to the IS sequences found in the S. flexneri genomes. However, there are many copies of IS1 and IS911 in the genome, making it difficult to draw any inference about their origins. It should be noted that an IS911 is also present in the Sf6 genome. However, that IS is located in the Nin region rather than in the virion head region, as was found here. ORF35 is an integrase sharing 99% amino acid identity with an integrase of phage HK022 (Supplementary Table S1) and may have played a role in the integration of the bacteriophage. These data clearly indicate that this segment of DNA carrying oac1b originated from a phage. Since no phage genes for recombination, immunity, replication and lysis were found, this sequence represents an incomplete prophage genome. Attempts to induce the phage from all 20 serotype 1b strains available in our collection, using conditions described by Mavris et al., 16 were unsuccessful. We also performed overlapping PCR amplification to show that the genomic organization of this prophage region is similar among the serotype 1b strains.

Apart from two regions of homology, the oac gene and IS911 as described above, the DNA sequence and gene organization of the oac1b-carrying prophage are entirely different from those of phage Sf6, and the prophage contains no remnants of Sf6 phage genes (Figure 2A and 2B). Thus, we can conclude that the oac1b-carrying prophage in serotype 1b strains had a non-Sf6 phage origin. Sequence analysis also indicates that this prophage remnant is not homologous to any other known bacteriophage. However, continuous similarity of genes was found with prophage regions of Salmonella enterica serovar Paratyphi B strain SPB7 and E. coli O127:H6 strain E2348/69, as shown in Figure 2C.

Current studies show that all the serotype-converting phages integrate into the host chromosome at two conserved positions, tRNA-thrW for SfI, SfII, SfIV, SfV and SfX 17 and tRNA-argW for Sf6. 14 Once integrated, the int and O-antigen modification genes are located at opposite ends of the phage DNA, ending with attL and attR sites. 17 To confirm that Sf6 was inserted into tRNA-argW in non-serotype 1b strains, a fragment of 4489 bp (accession no. JF450729) including the oac gene was obtained from strain 03HL12 (serotype 3a) by PCR using primer pair O5 (Table 2). The sequence and gene structure downstream of tRNA-argW were identical to those of Sf6, as expected from the structural organization of Sf6 (Figure 2A). The insertion site of the presumed oac1b-carrying phage is unusual. It has apparently inserted between genes torT and ycmA, resulting in the deletion of the genome region between torT and ycmA, including part of ycmA (1-935 bp) (Figure 2B).

Phage Sf6 is unable to infect and convert serotype 1a strains

We induced the Sf6 phage from serotype 3a strain 03HL12 and used it to infect the four serotypes (X, Y, 1a and 4a) that are expected to be convertible based on O-antigen structure. All four serotype X strains and 11 serotype Y strains tested could be converted into 3a and 3b, respectively, as shown previously. 7-9 Interestingly, four serotype 4a strains were also converted to serotype 4b, which is contrary to previous reports that serotype 4a cannot be infected by Sf6. 7 However, none of the 12 serotype 1a strains tested could be infected and converted into serotype 1b by phage Sf6 (Figure 3). A similar phenomenon was also observed by Clark et al. 7 Serotype 3b strains contain only an O-acetyl group connected to the O-2 position of rhamnose III of the O-antigen tetrasaccharide. Theoretically, serotype 3b can be converted to serotype 1b by adding a glucosyl group to the N-acetylglucosamine of the tetrasaccharide, which is mediated by the gtrI genes carried by phage SfI. We tested two serotype 3b strains by infecting them with phage SfI, but no serotype 1b convertants were found (Figure 3). The non-conversion cannot be attributed to the phage, since the same stock was previously used successfully to infect serotype X strains, which were converted to serotype 1d. 13
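The infection outcomes reported in this section can be collected into a small lookup structure, which makes it easy to ask which phage-host combinations yielded a given serotype. The record layout and names below are illustrative; the strain counts and outcomes are those stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assay:
    phage: str      # infecting phage
    host: str       # host serotype before infection
    expected: str   # serotype expected from the O-antigen structure
    tested: int     # number of host strains tested
    converted: int  # number of strains that yielded convertants


ASSAYS = [
    Assay("Sf6", "X", "3a", 4, 4),
    Assay("Sf6", "Y", "3b", 11, 11),
    Assay("Sf6", "4a", "4b", 4, 4),
    Assay("Sf6", "1a", "1b", 12, 0),
    Assay("SfI", "3b", "1b", 2, 0),
]


def routes_to(target):
    """Phage-host combinations that actually produced the target serotype."""
    return [a for a in ASSAYS if a.expected == target and a.converted > 0]


for serotype in ("3a", "3b", "4b", "1b"):
    found = [(a.phage, a.host) for a in routes_to(serotype)]
    print(serotype, found if found else "no convertants obtained")
```

Neither of the two theoretical routes to serotype 1b produced convertants in these experiments, in contrast to the routes to 3a, 3b and 4b.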

DISCUSSION 

The divergent oac1b is more likely to have been obtained from outside S. flexneri than to have undergone rapid divergence from the oac gene in the other serotypes (3a, 3b and 4b) within S. flexneri. Previous studies have shown that S. flexneri strains of serotypes 1 to 5 arose recently as an independent lineage from within E. coli, and that there is a very low level of variation in housekeeping genes. 18 The virulence plasmid carried by these strains also showed high levels of similarity among the serotype 1-5 strains. 19 Thus, given the high level of divergence, it seems less probable that oac1b evolved from the oac gene within S. flexneri.

Clark et al. 7 had previously shown that the oac gene cloned from Sf6 was capable of converting Y, X, 4a and 1a to 3b, 3a, 4b and 1b, respectively. Thus the oac and oac1b genes are functionally interchangeable despite the high level of sequence variation.

The host range of Sf6 was proposed to be restricted to strains with the group 3,4 antigen of the O-polysaccharide chain, which is presumably recognized by the phage tail protein TSP. 20 Hydrolysis by TSP of the 1,3-α-linkage between rhamnose II and rhamnose III exposes the host cell membrane to phage DNA, allowing entry into the host and completion of lysogenic conversion. 21,22 Since all four serotypes (X, Y, 1a and 4a) carry the group 3,4 antigen, there must be additional antigenic differences that render serotype 1a resistant to Sf6 infection.

The putative insertion site of the likely oac1b-carrying phage appears to be unusual. The phage had apparently inserted between genes torT and ycmA, resulting in the deletion of the genome region between torT and ycmA, including 936 bp of the ycmA gene. The tor operon, which encodes the trimethylamine N-oxide respiratory system, is apparently nonfunctional in S. flexneri, as torD, torA and torS are known pseudogenes in S. flexneri 2a strains Sf301, 2457T and 2002017. ycmA, which encodes a putative outer membrane lipoprotein, is highly conserved among Shigella and E. coli. The effect of inactivation of the ycmA gene in 1b strains is not clear. It should be noted that neither tRNA genes nor an att site sequence were found in this region of 1b strains.

Our data clearly indicate that the DNA carrying oac1b originated from a phage; however, since no phage genes for recombination, immunity, replication and lysis were found, this sequence appears to represent an incomplete prophage genome. Our attempts to induce the phage, using the conditions described above, from all 20 serotype 1b strains available in our collection were unsuccessful. Additionally, overlapping PCR amplification showed that the genomic organization of this prophage region is similar among all the serotype 1b strains.

Of significance, we found that the oac1b gene mediating O-antigen acetylation in S. flexneri serotype 1b strains was highly divergent from the oac gene in serotype 3a, 3b and 4b strains and phage Sf6. We have shown that oac1b was likely part of a prophage and had a non-Sf6 phage origin. In comparison with the Sf6-like genomic structure in serotype 3a, 3b and 4b strains, the organization of the prophage carrying oac1b in the serotype 1b chromosome is unique. The ancestral phage was apparently inserted between genes torT and ycmA, although no typical phage attachment sites were found. Sf6 infection experiments indicated that the LPS of serotype 1a must contain additional changes that render it resistant to Sf6 infection. This study extends the current understanding of serotype conversion in S. flexneri and helps identify mechanisms that give rise to Shigella serotype evolution and diversity.

Figure 3 Schematic diagram of the sensitivity of S. flexneri serotypes (1a, X, Y and 4a) to serotype-converting phage Sf6 and of serotype 3b to serotype-converting phage SfI. Numbers of strains tested are shown in parentheses.